# Webpage Parsing
Pix2Struct Large
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on image-text pairs and suitable for a range of vision-language tasks.
Image-to-Text
Transformers
Supports Multiple Languages

google
6,601
34
Pix2Struct OCR-VQA Base
Apache-2.0
Pix2Struct is a visual question answering model fine-tuned for OCR-VQA tasks; it reads the text in an image and answers questions about it.
Image-to-Text
Transformers
Supports Multiple Languages

google
38
1
Pix2Struct DocVQA Base
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on image-text pairs, supporting various tasks including image captioning and visual question answering.
Image-to-Text
Transformers
Supports Multiple Languages

google
8,601
37
Pix2Struct Base
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on various image-text pairs for tasks including image captioning and visual question answering.
Image-to-Text
Transformers
Supports Multiple Languages

google
6,390
71